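This paper proposes a new approach to unsupervised fine-tuning and self-training with unlabeled speech data for recurrent neural network transducer (RNN-T) end-to-end (E2E) automatic speech recognition (ASR) systems. When conventional systems use unlabeled audio, they fine-tune or self-train with ASR hypotheses as targets, which makes them sensitive to the ASR performance of the base model. To mitigate the effect of ASR errors when using unlabeled data, we propose a multiple-hypothesis RNN-T loss that incorporates multiple ASR 1-best hypotheses into the loss function. For the fine-tuning task, ASR experiments on LibriSpeech show that the multiple-hypothesis approach achieves a 14.2% relative reduction in word error rate (WER) on the test_other set compared with the single-hypothesis approach. For the self-training task, ASR models are trained using supervised data from the Wall Street Journal (WSJ) corpus and Aurora-4, with CHiME-4 real noisy data as the unlabeled data. Compared with the single-hypothesis approach, the multiple-hypothesis approach yields a 3.3% relative WER reduction on the CHiME-4 single-channel real noisy evaluation set.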
translated by 谷歌翻译
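The abstract above reports its gains as relative reductions in word error rate. As a minimal sketch of how such figures are usually computed, the following assumes WER is the word-level edit distance divided by the reference length; the function names `word_error_rate` and `relative_reduction` are illustrative and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from the hypothesis
            insertion = curr[j - 1] + 1  # extra word in the hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (0.142 means 14.2%)."""
    return (baseline_wer - new_wer) / baseline_wer
```

A relative reduction compares two WER values, so a 14.2% relative reduction does not mean the absolute WER dropped by 14.2 points.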
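In this paper, we explore an improved framework for training a monaural neural enhancement model for robust speech recognition. The proposed training framework extends the existing mixture invariant training criterion to exploit both unpaired clean speech and real noisy data. We find that unpaired clean speech is crucial for improving the quality of speech separated from real noisy recordings. The proposed method also remixes the processed and unprocessed signals to alleviate processing artifacts. Experiments on the single-channel CHiME-3 real test sets show that the proposed method significantly improves speech recognition performance over enhancement systems trained either on mismatched simulated data in a supervised fashion or on matched real data in an unsupervised fashion. Compared with the unprocessed signal, the proposed system achieves a relative word error rate reduction of 16% to 39% using end-to-end and hybrid acoustic models, without retraining on distorted data.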
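Models that can handle a wide range of speakers and acoustic conditions are essential in speech emotion recognition (SER). Such models often show mixed results when presented with speakers or acoustic conditions that were not seen during training. This paper investigates the impact of cross-corpus data complementation and data augmentation on the performance of SER models in matched (test set from the same corpus) and mismatched (test set from a different corpus) conditions. We present investigations using six emotional speech corpora that include single and multiple speakers as well as variations in emotion style (acted, elicited, natural) and recording conditions. As expected, models trained on a single corpus perform best in matched conditions, while performance drops by 10-40% in mismatched conditions, depending on corpus-specific features. Models trained on mixed corpora can be more stable in mismatched contexts, with performance reductions of 1% to 8% relative to single-corpus models in matched conditions. Data augmentation yields additional gains of up to 4% and appears to benefit mismatched conditions more than matched ones.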
We address the problem of extracting key steps from unlabeled procedural videos, motivated by the potential of Augmented Reality (AR) headsets to revolutionize job training and performance. We decompose the problem into two steps: representation learning and key step extraction. We employ self-supervised representation learning via a training strategy that adapts off-the-shelf video features using a temporal module. Training uses self-supervised losses involving multiple cues, such as appearance, motion, and pose trajectories extracted from videos, to learn generalizable representations. Our method extracts key steps via a tunable algorithm that clusters the representations extracted from procedural videos. We quantitatively evaluate our approach on key step localization and also demonstrate the effectiveness of the extracted representations on related downstream tasks such as phase classification. Qualitative results demonstrate that the extracted key steps are meaningful and succinctly represent the procedural tasks.
Person recognition at a distance entails recognizing the identity of an individual appearing in images or videos collected by long-range imaging systems such as drones or surveillance cameras. Despite recent advances in deep convolutional neural networks (DCNNs), this remains challenging. Images or videos collected by long-range cameras often suffer from atmospheric turbulence, blur, low resolution, unconstrained poses, and poor illumination. In this paper, we provide a brief survey of recent advances in person recognition at a distance. In particular, we review recent work in multi-spectral face verification, person re-identification, and gait-based analysis techniques. Furthermore, we discuss the merits and drawbacks of existing approaches and identify important yet underexplored challenges for deploying remote person recognition systems in the wild.
We investigate the asymptotic properties of deep Residual networks (ResNets) as the number of layers increases. We first show the existence of scaling regimes for trained weights markedly different from those implicitly assumed in the neural ODE literature. We study the convergence of the hidden state dynamics in these scaling regimes, showing that one may obtain an ODE, a stochastic differential equation (SDE) or neither of these. In particular, our findings point to the existence of a diffusive regime in which the deep network limit is described by a class of stochastic differential equations (SDEs). Finally, we derive the corresponding scaling limits for the backpropagation dynamics.
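To make the distinction between an ODE limit and a diffusive (SDE) limit concrete, the following is a heuristic illustration using two standard discretization schemes. The notation ($h_k$, $L$, $f$, $a$, $b$, $\xi_k$) is assumed here for illustration and is not taken from the paper, whose precise scaling regimes and assumptions may differ.

```latex
% Residual update with depth L and step 1/L, with parameters varying smoothly in depth.
% This is the explicit Euler scheme, which is the setting implicitly assumed by neural ODEs:
h_{k+1} = h_k + \frac{1}{L}\, f\!\left(h_k, \theta(k/L)\right),
\qquad k = 0, \dots, L-1,
\qquad\Longrightarrow\qquad
\frac{\mathrm{d}h_t}{\mathrm{d}t} = f\!\left(h_t, \theta(t)\right) \quad (L \to \infty).

% If the per-layer contribution instead contains a noise-like part of size L^{-1/2}
% (\xi_k i.i.d. standard Gaussian), the update is an Euler-Maruyama scheme,
% whose limit is a stochastic differential equation driven by a Brownian motion B_t:
h_{k+1} = h_k + \frac{1}{L}\, a\!\left(k/L, h_k\right) + \frac{1}{\sqrt{L}}\, b\!\left(k/L, h_k\right) \xi_k
\qquad\Longrightarrow\qquad
\mathrm{d}h_t = a(t, h_t)\,\mathrm{d}t + b(t, h_t)\,\mathrm{d}B_t .
```

Which of these pictures applies, if either, depends on how the trained weights scale with depth, which is the question the abstract raises.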
We address the problem of few-shot classification, where the goal is to learn a classifier from a limited set of samples. While data-driven learning has been shown to be effective in various applications, learning from limited data remains challenging. To address this challenge, existing approaches consider various data augmentation techniques for increasing the number of training samples. Pseudo-labeling is commonly used in a few-shot setup, where approximate labels are estimated for a large set of unlabeled images. We propose DiffAlign, which focuses on generating images from class labels. Specifically, we leverage the recent success of generative models (e.g., DALL-E and diffusion models) that can generate realistic images from text. However, naive learning on synthetic images is not adequate due to the domain gap between real and synthetic images. Thus, we employ a maximum mean discrepancy (MMD) loss to align the synthetic images to the real images, minimizing the domain gap. We evaluate our method on the standard few-shot classification benchmarks CIFAR-FS, FC100, miniImageNet, and tieredImageNet, and on a cross-domain few-shot classification benchmark, miniImageNet to CUB. The proposed approach significantly outperforms the state-of-the-art in both 5-shot and 1-shot setups on these benchmarks. Our approach is also shown to be effective in the zero-shot classification setup.
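The MMD loss mentioned above can be illustrated with a small sketch. This computes the standard biased estimate of squared MMD between two sets of feature vectors. It assumes a Gaussian (RBF) kernel with a single hand-picked bandwidth, and the names `rbf_kernel`, `mean_kernel`, and `mmd_squared` are hypothetical. The abstract does not state which kernel or feature space DiffAlign uses.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, bandwidth: float) -> float:
    """Gaussian kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs: Sequence[Vector], ys: Sequence[Vector], bandwidth: float) -> float:
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(real: Sequence[Vector], synthetic: Sequence[Vector],
                bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(r, r')] + E[k(s, s')] - 2 E[k(r, s)]."""
    return (mean_kernel(real, real, bandwidth)
            + mean_kernel(synthetic, synthetic, bandwidth)
            - 2.0 * mean_kernel(real, synthetic, bandwidth))
```

In a training loop this quantity would be computed on real and synthetic feature batches and added to the classification loss, so that minimizing it pulls the two feature distributions together.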
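We present Progressively Deblurring Radiance Field (PDRF), a novel approach for efficiently reconstructing high-quality radiance fields from blurry images. While current state-of-the-art (SOTA) scene reconstruction methods achieve photorealistic rendering from clean source views, their performance suffers when the source views are affected by blur, which is commonly observed in images captured in the wild. Previous deblurring methods either do not account for 3D geometry or are computationally intensive. To address these issues, PDRF, a progressive deblurring scheme for radiance field modeling, accurately models blur by incorporating 3D scene context. PDRF further uses an efficient importance sampling scheme, which results in fast scene optimization. Specifically, PDRF proposes a Coarse Ray Renderer to quickly estimate voxel density and features; a Fine Voxel Renderer is then used to achieve high-quality ray tracing. We perform extensive experiments and show that PDRF is 15 times faster than the previous SOTA while achieving better performance on both synthetic and real scenes.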
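Reconstructing lung cone-beam computed tomography (CBCT) under respiratory motion is a long-standing challenge. This work goes a step further and addresses a challenging setting: reconstructing multi-phase lung images from only a single 3D CBCT acquisition. To this end, we introduce REspiratory-GAted Synthesis of views, or REGAS. REGAS proposes a self-supervised method to synthesize the undersampled tomographic views and mitigate aliasing artifacts in the reconstructed images. This method allows a much better estimation of the between-phase deformation vector fields (DVFs), which are used to enhance the quality of reconstructions from direct observations without synthesis. To address the large memory cost of deep neural networks on high-resolution 4D data, REGAS introduces a novel Ray Path Transformation (RPT) that allows for distributed, differentiable forward projections. REGAS requires no additional measurements such as prior scans, air-flow volume, or breathing velocity. Our extensive experiments show that REGAS significantly outperforms comparable methods in quantitative metrics and visual quality.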
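Unsupervised and semi-supervised ML methods such as variational autoencoders (VAEs) have been widely adopted across multiple areas of physics, chemistry, and materials science because of their ability to disentangle representations and to find latent manifolds for classification and regression of complex experimental data. Like other ML problems, VAEs require hyperparameter tuning, e.g., balancing the Kullback-Leibler (KL) and reconstruction terms. However, the training process, and the resulting manifold topology and connectivity, depend not only on the hyperparameters but also on their evolution during training. Because exhaustive search in a high-dimensional hyperparameter space is inefficient, we explore a latent Bayesian optimization (zBO) approach to hyperparameter trajectory optimization for unsupervised and semi-supervised ML, and demonstrate it for a joint-VAE with rotational invariances. We demonstrate an application of this method to finding joint discrete and continuous rotationally invariant representations for MNIST and for experimental data from a plasmonic nanoparticle material system. The performance of the proposed approach is discussed extensively; it allows for high-dimensional hyperparameter tuning or trajectory optimization of other ML models.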
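The balance between the KL and reconstruction terms mentioned above can be sketched as follows. This is a minimal illustration, not the paper's joint rotationally invariant VAE. It assumes a diagonal Gaussian posterior with a standard normal prior and a squared-error reconstruction term. The weight `beta` and the helper `linear_beta_schedule` are hypothetical: the schedule shows only one simple way of parameterizing a hyperparameter trajectory over training, whereas the abstract describes searching over such trajectories with Bayesian optimization.

```python
import math
from typing import Sequence


def gaussian_kl(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def squared_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Sum of squared differences, used here as the reconstruction term."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var, beta: float) -> float:
    """Reconstruction term plus beta-weighted KL term."""
    return squared_error(x, x_hat) + beta * gaussian_kl(mu, log_var)


def linear_beta_schedule(step: int, total_steps: int,
                         beta_start: float, beta_end: float) -> float:
    """Hypothetical trajectory: move beta linearly from beta_start to beta_end."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + frac * (beta_end - beta_start)
```

Under this parameterization a trajectory is described by just two numbers (`beta_start`, `beta_end`), which gives an optimizer a low-dimensional search space. Richer trajectories would need more parameters.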